Utility and Metrics of Natural Language Processing on Identifying Patients for Pharmacoepidemiologic Studies

Authors

  • N Bailey
  • A Wilson
  • A Kamauu
  • S DuVall
Abstract

Objective: Electronic medical records (EMR) are increasingly utilized in clinical practice and research, making rich patient records more efficiently available. However, most use of EMR is limited to coded, structured, administrative data, while the vast majority of patient information (e.g., disease subtype, severity, medical device usage) is tied up in narrative clinical notes. The challenge remains in accessing the information in these notes. Historically, this has been done through time-consuming and costly manual chart review, but as the amount of EMR data increases exponentially, manual chart review becomes impractical and, ultimately, impossible. Advancements in Natural Language Processing (NLP) have demonstrated promising results in combining the capture of additional clinical note information with the efficiency of modern informatics. The objective of this study is to demonstrate the relevancy and utility of NLP to extract health data from EMR in real-world observational studies.

Methods: We conducted a systematic review and meta-analysis of performance metrics for five (5) NLP-driven projects involving oncology, inflammation and medical devices, which had similar protocols and objectives. We assessed and validated the accuracy of the NLP algorithms, as well as the heterogeneity of accuracy between studies, using random-effects meta-analysis (heterogeneity represented by the I² value).

Results: A total of 382,523 patients were identified using NLP across the 5 studies. Accuracy ranged from 95.2% to 100% across studies (95% CI: 95.1%, 100%), with an I² value of 95.9% (95% CI: 92.9%, 97.7%).

Conclusion: NLP provides a unique opportunity to extract meaningful information from patient-level narrative clinical notes in EMR data sources with a high degree of accuracy. This turns narrative clinical notes, which are otherwise not easily accessible, into an additional rich source of data to support epidemiology and other real-world observational studies.
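The abstract summarizes heterogeneity with an I² value from a random-effects meta-analysis but does not state which estimator was used. The Python sketch below shows one common way such figures are produced, assuming the DerSimonian-Laird estimator applied to per-study accuracies on the logit scale. The function and field names (`random_effects`, `logit_and_variance`, `PooledResult`), the 0.5 continuity correction for studies with exactly 100% accuracy, and the normal-approximation 95% interval are illustrative assumptions, and the example counts are hypothetical rather than the poster's data. I² is computed as (Q - df)/Q, truncated at zero, where Q is Cochran's heterogeneity statistic.

```python
import math
from dataclasses import dataclass


@dataclass
class PooledResult:
    accuracy: float  # pooled accuracy, back-transformed to a proportion
    ci_low: float    # lower bound of the 95% confidence interval
    ci_high: float   # upper bound of the 95% confidence interval
    q: float         # Cochran's Q statistic
    i2: float        # I^2 as a fraction in [0, 1]
    tau2: float      # between-study variance on the logit scale


def logit_and_variance(correct: int, total: int) -> tuple[float, float]:
    """Logit of one study's accuracy and its approximate variance.

    Adds 0.5 to both cells when accuracy is exactly 0% or 100% (an assumed
    continuity correction) so that the logit and its variance stay finite.
    """
    right, wrong = float(correct), float(total - correct)
    if right == 0 or wrong == 0:
        right, wrong = right + 0.5, wrong + 0.5
    return math.log(right / wrong), 1.0 / right + 1.0 / wrong


def i_squared(q: float, df: int) -> float:
    """I^2: share of total variation attributed to between-study heterogeneity."""
    return 0.0 if q <= df else (q - df) / q


def expit(x: float) -> float:
    """Inverse logit: maps the logit scale back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def random_effects(studies: list[tuple[int, int]]) -> PooledResult:
    """DerSimonian-Laird random-effects pooling of (correct, total) pairs."""
    ys, vs = zip(*(logit_and_variance(c, n) for c, n in studies))
    w = [1.0 / v for v in vs]  # inverse-variance (fixed-effect) weights
    y_fe = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, ys))
    df = len(studies) - 1
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in vs]  # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return PooledResult(
        accuracy=expit(y_re),
        ci_low=expit(y_re - 1.96 * se),
        ci_high=expit(y_re + 1.96 * se),
        q=q,
        i2=i_squared(q, df),
        tau2=tau2,
    )


if __name__ == "__main__":
    # Hypothetical (correct, total) counts; NOT the data behind the poster.
    studies = [(952, 1000), (495, 500), (300, 300), (1880, 2000), (780, 800)]
    r = random_effects(studies)
    print(f"pooled accuracy {r.accuracy:.3f} "
          f"(95% CI {r.ci_low:.3f}-{r.ci_high:.3f}), I^2 = {r.i2:.1%}")
```

Working on the logit scale keeps the pooled accuracy and its interval inside 0 to 1; without the continuity correction, a study with 100% accuracy would have zero variance and receive unbounded weight. The sketch does not reproduce the confidence interval reported for I² itself, which requires a separate method.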
University of Utah School of Medicine

BACKGROUND
▶ Electronic medical records (EMR) are increasingly utilized in clinical practice and research, making rich patient records more efficiently available.1
▶ However, most use of EMR is limited to coded, structured, administrative data, while the vast majority of patient information (e.g., disease subtype, severity, medical device usage) is tied up in narrative clinical notes.1-4
▶ Historically, this information has been accessed through time-consuming and costly manual chart review, but as the amount of EMR data increases exponentially, manual chart review becomes impractical and, ultimately, impossible.3
▶ Advancements in Natural Language Processing (NLP) have demonstrated promising results in combining the capture of additional clinical note information with the efficiency of modern informatics.4-9
▶ Figure 1 compares traditional abstraction of EMR data with NLP-driven abstraction for one of the five projects analyzed in this abstract.5
▶ NLP processes are often dynamic and iterative, and can be difficult to standardize.
▶ By definition, the unstructured notes in electronic medical records are not held to a required systematic structure. They are therefore highly variable within and between health delivery systems, further limiting the ability to standardize NLP processes.
▶ NLP must be carefully refined and validated for the context and negation of diagnoses so that, for example, a phrase such as "patient does not have BED" is categorized accurately.
▶ Data available for NLP-driven extraction is limited to what the clinician documents in the clinical notes.

OBJECTIVE
▶ The objective of this study is to demonstrate the relevancy and utility of NLP to extract health data from EMR in real-world observational studies.

▶ We conducted a systematic review and meta-analysis of performance metrics for five (5) NLP-driven projects involving oncology, inflammation, central nervous system disorders and medical devices, which had similar protocols and objectives.5-9
▶ Figure 2 is a visual representation of the disease area, the project and the elements of each study that required NLP. The ANCA-associated vasculitides (AAV) project encompasses granulomatosis with polyangiitis (GPA), microscopic polyangiitis (MPA) and Churg-Strauss syndrome (CSS), as described in Figure 3.
▶ We assessed and validated the accuracy of the NLP algorithms, as well as the heterogeneity of accuracy between studies, using random-effects meta-analysis (heterogeneity represented by the I² value).

Figure 1: Comparison of traditional and NLP-driven methods of abstracting diagnoses of eating disorder not otherwise specified (EDNOS) and binge eating disorder (BED) from electronic medical records (EMR).5
Figure 2: Disease area, project and use of NLP for five (5) NLP-driven projects
Figure 3: Accuracy of NLP among 4 studies
aBCC = advanced basal cell carcinoma; BED = binge-eating disorder; AAV = ANCA-associated vasculitides

Conflicts of Interest: Authors N Bailey and A Wilson are employees of Anolinx, LLC. Author AWC Kamauu is an owner of Anolinx, LLC. The studies described in this poster were sponsored by Genentech, Inc., F. Hoffmann-La Roche, Ltd, and Shire Development Inc. through research contracts.

REFERENCES
1. Ananthakrishnan AN, Cai T, Savova G, et al. Improving Case Definition of Crohn's Disease and Ulcerative Colitis in Electronic Medical Records Using Natural Language Processing: A Novel Informatics Approach. Inflammatory Bowel Diseases. 2013;19(7):1411-1420.
2. Liddy ED, Turner AM, Bradley J. Modeling Interventions to Improve Access to Public Health Information. AMIA Annual Symposium Proceedings. 2003;2003:909.
3. Love TJ, Cai T, Karlson EW. Validation of psoriatic arthritis diagnoses in electronic medical records using natural language processing. Seminars in Arthritis and Rheumatism. 2011;40(5):413-420.
4. Murff HJ, FitzHenry F, Matheny ME, et al. Automated Identification of Postoperative Complications Within an Electronic Medical Record Using Natural Language Processing. JAMA. 2011;306(8):848-855.
5. Bellows B, LaFleur J, Kamauu AWC, et al. Automated identification of patients with a diagnosis of binge eating disorder from narrative electronic health records. J Am Med Inform Assoc. 2014;21(e1):e163-e168.
6. DuVall SL, Kamauu AWC, Sauer BC. Using Advanced Healthcare Data Analytics via Validated Natural Language Processing to Identify and Characterize Central Venous Catheterization Episodes in the Veterans Affairs Electronic Health Records. ICPE Conference Proceedings, Aug 2012.
7. DuVall SL, Kamauu AWC, Napalkov P, et al. Using Natural Language Processing of Electronic Health Records to Identify Patients with ANCA-Associated Vasculitides in the Veterans Affairs. ICPE Conference Proceedings, Aug 2012.
8. DuVall SL, Patterson OV, Forbush TB, et al. Using Advanced Healthcare Data Analytics to Identify Patients with Advanced Basal Cell Carcinoma in a Large Nationwide Healthcare Institution. ACMS Conference Proceedings, May 2013.
9. DuVall SL, Butler J, LaFleur J, et al. Determining Multiple Sclerosis Subtype from Electronic Medical Records. ICPE Conference Proceedings, Aug 2013.
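The poster stresses that NLP must be validated for context and negation, so that a phrase like "patient does not have BED" is not counted as a BED diagnosis. The Python sketch below illustrates one widely used rule-based idea, in the spirit of NegEx: label a mention as negated when a negation trigger phrase appears within a few tokens before the target term. It is not the algorithm used in the five projects; the trigger list, the 5-token window, and the names `classify_mention` and `NEGATION_TRIGGERS` are hypothetical, and production systems also handle scope-terminating words (e.g., "but"), historical or hypothetical mentions, and much larger curated lexicons.

```python
import re

# Hypothetical trigger lexicon; NegEx-style systems use much larger curated lists.
NEGATION_TRIGGERS = ("does not have", "negative for", "denies", "without", "not", "no")
WINDOW = 5  # assumed max number of tokens a trigger may precede the term by


def tokenize(text: str) -> list[str]:
    """Split text into word tokens, preserving case."""
    return re.findall(r"\w+", text)


def _contains(haystack: list[str], needle: list[str]) -> bool:
    """True if `needle` occurs as a contiguous run of tokens in `haystack`."""
    n = len(needle)
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


def classify_mention(sentence: str, term: str) -> str:
    """Label the first mention of `term` as 'affirmed', 'negated' or 'absent'.

    Term matching is case-sensitive so the acronym "BED" is not confused with
    the word "bed"; trigger matching is case-insensitive.
    """
    tokens = tokenize(sentence)
    target = tokenize(term)
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            # Only look back a fixed number of tokens before the mention.
            preceding = [t.lower() for t in tokens[max(0, i - WINDOW):i]]
            if any(_contains(preceding, trig.split()) for trig in NEGATION_TRIGGERS):
                return "negated"
            return "affirmed"
    return "absent"


if __name__ == "__main__":
    for note in ("Patient does not have BED.", "History of BED, currently in treatment."):
        # Prints "negated" for the first note and "affirmed" for the second.
        print(f"{note!r} -> {classify_mention(note, 'BED')}")
```

A fixed look-back window keeps the rule simple and predictable, which makes it easy to validate against chart review, but it will misfire on sentences such as "no history of anxiety; BED confirmed", which is why such rules are refined iteratively as the poster describes.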

Journal:
  • Value in Health: The Journal of the International Society for Pharmacoeconomics and Outcomes Research

Volume 18, Issue 7

Publication date: 2015